Stratified random sampling from streaming and stored data
Similar resources
Variance-Optimal Offline and Streaming Stratified Random Sampling
Stratified random sampling (SRS) is a fundamental sampling technique that provides accurate estimates for aggregate queries using a small size sample, and has been used widely for approximate query processing. A key question in SRS is how to partition a target sample size among different strata. While Neyman allocation provides a solution that minimizes the variance of an estimate using this sa...
Scalable Simple Random Sampling and Stratified Sampling
Analyzing data sets of billions of records has now become a regular task in many companies and institutions. In the statistical analysis of those massive data sets, sampling generally plays a very important role. In this work, we describe a scalable simple random sampling algorithm, named ScaSRS, which uses probabilistic thresholds to decide on the fly whether to accept, reject, or wait-list an...
Improved Exponential Estimator in Stratified Random Sampling
In this article we have considered the problem of estimating the population mean Y in the stratified random sampling using the information of an auxiliary variable x which is correlated with y and suggested improved exponential ratio estimators in the stratified random sampling. The mean square error (MSE) equations for the proposed estimators have been derived and it is shown that the prop...
Stratified and Un-stratified Sampling in Data Mining: Bagging
Stratified sampling is often used in opinion polls to reduce standard errors, and it is known as variance reduction technique in sampling theory. The most common approach of resampling method is based on bootstrapping the dataset with replacement. A main purpose of this work is to investigate extensions of the resampling methods in classification problems, specifically we use decision trees, fr...
Interval Estimation for Small Area Proportions with Small True Proportions from Stratified Random Sampling Survey Data
Consider interval estimation of $m$ small area proportions $P_i$ ($i = 1, \dots, m$), where we assume a stratified random sampling design with equal number of observations $n$ in each stratum, and where the domains of interest are the strata. A $100(1 - \alpha)\%$ confidence interval for $P_i$ that has appeared repeatedly in the literature and is used in application is given by $\hat{P}_i \pm z_{\alpha/2} \sqrt{\mathrm{mse}_i}$, where $\hat{P}_i$ and mse...
Journal
Journal title: Distributed and Parallel Databases
Year: 2020
ISSN: 0926-8782,1573-7578
DOI: 10.1007/s10619-020-07315-w